{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Vector stores\n",
    "One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.\n",
    "\n",
    "存储和搜索非结构化数据的最常见方法之一是嵌入它并存储生成的嵌入向量，然后在查询时嵌入非结构化查询并检索与嵌入式查询“最相似”的嵌入向量。矢量存储负责存储嵌入数据并为您执行矢量搜索。\n",
    "\n",
    "<img src=\"https://python.langchain.com/v0.1/assets/images/vector_stores-125d1675d58cfb46ce9054c9019fea72.jpg\" width=\"600\"/>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import TextLoader\n",
    "from langchain_openai import OpenAIEmbeddings\n",
    "from langchain_text_splitters import CharacterTextSplitter\n",
    "from langchain_chroma import Chroma\n",
    "\n",
    "# Load the document, split it into chunks, embed each chunk and load it into the vector store.\n",
    "raw_documents = TextLoader('../data/meow.txt').load()\n",
    "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "documents = text_splitter.split_documents(raw_documents)\n",
    "db = Chroma.from_documents(documents, OpenAIEmbeddings())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Similarity search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "meow meow🐱 \n",
      " meow meow🐱 \n",
      " meow😻😻\n",
      "\n",
      " sss\n",
      " aaa\n"
     ]
    }
   ],
   "source": [
    "query = \"What did the president say about Ketanji Brown Jackson\"\n",
    "docs = db.similarity_search(query)\n",
    "print(docs[0].page_content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Similarity search by vector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "meow meow🐱 \n",
      " meow meow🐱 \n",
      " meow😻😻\n",
      "\n",
      " sss\n",
      " aaa\n"
     ]
    }
   ],
   "source": [
    "embedding_vector = OpenAIEmbeddings().embed_query(query)\n",
    "docs = db.similarity_search_by_vector(embedding_vector)\n",
    "print(docs[0].page_content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "langchain0_1",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
